Conditional coefficient for extremal graphical models

Let \(V = \{1, \dots, d\}\) the set of nodes of a graph \(\mathcal G=(V,E)\).

We want to define a new coefficient used for characterize the conditional dependence to apply for a pairwise Markov property in the extremal graphical model setting (Engelke and Hitz 2020).

Extremal/Asymptotical independence

Let \(X\) a random \(d\)-dimensional vector where the marginal distribution are standard Frechet.

Case \(d=2\)

One famous quantity in extreme value theory is the bivariate extremal coefficient which is defined as follow :

\[ \chi_{1,2} = \lim_{q\rightarrow 1} \mathbb P(F(X_1) >q ~|~ F(X_2)>q) = \lim_{q\rightarrow 1}\chi_{1,2}(q) \] if the right-hand term exists, where \(F\) is the distribution function of a standard Frechet.

We say that \(X_1\) and \(X_2\) are asymptotically independent if \(\chi_{1,2} = 0\) and asymptotically dependent otherwise.

If \(X\) is in the attraction domain of generalized Pareto distribution \(Y\) of exponent measure \(\Lambda\), there is an equivalence between :

\[ \chi_{1,2} = 0 \Longleftrightarrow \Big(\forall x=(x_1,x_2)>0 ,~H(x) = H_1(x_1) +H_2(x_2)\Big) \] where \(H(y) = \Lambda\big[(\mathbb R_+)^2 \setminus [0,y]\big]\).

And we know in (Strokorb 2020) that is equivalent to the extended extremal independence \(Y_1 \perp \!\! \perp_e Y_2\).

Case \(d>2\)

For some dimension \(d\), we have similar results. Let \(A\) and \(B\) a partition of \(V\). With the extended extremal independence of (Strokorb 2020), one can show a similar equivalence as above where :

\[ Y_A \perp \!\! \perp_e Y_B \Longleftrightarrow \Big( \forall x=(x_A,x_B)>0 ,~H(x) = H_A(x_A) + H_B(x_B) \Big) \]

We can also define the Sum of Extremal COefficient of the partition \(\{A,B\}\) by : \[ SECO_{A,B} = H_A(1\!\!1_A) + H(1\!\!1_B) - H(1\!\!1) \] where \(H_C\) is the exponent measure function of the extreme limit of the sub-vector \(X_C\).

With (Ferreira 2011), we can show that \(SECO_{A,B}=0\) if and only if we have \(Y_A \perp \!\! \perp_e Y_B\).

Motivation

For both situation, we can stress that there is a list of equivalent assertion which are following the same structure : extremal (or asymptomatic) independence, exponent measure decomposition, and coefficient equality.

So we can wonder, in the context of extremal conditional independence, if we can define a conditional exponent measure function \(H^C\) to have the relation :

\[ Y_A \perp \!\! \perp_e Y_B ~|~ Y_C \Longleftrightarrow \Big(\forall x=(x_A,x_B)>0 ,~H^C(x) = H^C_A(x_A) + H^C_B(x_B) \Big) \] where \(A,B,C\) is a partition of \(V\).

Furthermore, can we find some coefficient \(\theta_{AB}^C\) (calculable from data) and a constant \(c \in \mathbb R\) such that : \[ \theta_{AB}^C = c \Longleftrightarrow Y_A \perp \!\! \perp_e Y_B ~|~ Y_C \]

More I look a these relations, more I think it will be difficult to get… The previous relation of independence where strongly linked to the independence of the corresponding max stable distribution. But in our case, we can’t talk about conditional independence for the max stable distribution because it does not exist (I mean, it is equivalent to the classical independence).

If I really want to build these relations, I think we will need to use or create new notions, which ideally can be linked to the previous ones (exponent measure, etc…).

The conditional extremal coefficient \(\chi_{ij}^A\)

The aim of the section is to present a new coefficient for extreme graphical models and try to link the general case to the univariate one, which is simpler to compute. As above, let \(X\) a random \(d\)-dimensional vector where the marginal distribution are standard Frechet.

The trivariate coefficient \(\chi_{ijk}\)

We define the trivariate extremal coefficient as :

\[ \chi_{ijk} = \lim_{q \rightarrow 1}\mathbb P(F(X_i)>q,F(X_j)>q~|~F(X_k)>q) = \lim_{q \rightarrow 1}\chi_{ijk}(q) \]

if the right-hand term exists.

In that case, if the vector \(X\) is in the attraction domain of \(Y\), the corresponding generalized Pareto distribution, then we also get :

\[\begin{equation}\label{eq:chi_pareto} \chi_{ijk} = \mathbb P(Y_i>1,Y_j>1~|~Y_k>1) \end{equation}\]

We would like to use this coefficient in order to caractherize the conditional independence (or dependence) between the component in a similar sense of the extremal conditional independence (see (Engelke and Hitz 2020)).

Case where \(A = \{k\}\)

By analogy on the extremal coefficient \(\chi_{ij}\), we define the conditional extremal coefficient by :

\[ \chi_{ij}^k = \lim_{q \rightarrow 1}\mathbb P(F(X_i)>q ~|~ F(X_j)>q, F(X_k)>q) = \frac{\chi_{ijk}}{\chi_{jk}} \] in the same setting as above.

Of course, the equation in the last section gives us another way to compute this quantity using the corresponding generalised Pareto distribution \(Y\) :

\[ \chi_{ij}^k = \mathbb P(Y_i>1~|~Y_j>1, Y_k>1) \]

Definition for any \(A\)

Now, let’s define for some \(A \subset V\).

Let \(i,j \in V\), then the conditional extremal coefficient is defined as :

\[ \chi_{ij}^A = \lim_{q \rightarrow 1}\mathbb P(F(X_i)>q ~|~ F(X_j)>q, ||F(X_A)||_{\infty}>q) = \lim_{q\rightarrow 1}\chi_{ij}^A(q) \]

where \(F(X_A) = (F(X_k))_{k \in A}\), if the right-hand limit exists.

We could use different extension for a general subset \(A\) of \(V\) as for example \(\lim_{q \rightarrow 1}\mathbb P(F(X_i)>q ~|~ F(X_j)>q, F(X_A)>q)\), but let’s focus on the first one before.

First, we can write another expression for this coefficient, similar to the previous section, which is : \[ \chi_{ij}^A=\mathbb P(Y_i>1~|~Y_j>1, ||Y_A||_\infty > 1) \] (the demonstration is the same as the case where \(A = \{k\}\))

More, we have some inequalities between the conditional coefficient \(\chi^A_{ij}\) and all sub-coefficient \(\chi^k_{ij}\) for \(k \in A\).

Proposition. Let \(A \subset V\) and \(i, j \notin A\). Then we have :

  • \(\chi^k_{ij} \le |A| \chi^A_{ij}\) for all \(k \in A\).

  • \(\chi^A_{ij} \le \sum_{k\in A} \chi^k_{ij}\).

In particular, we have thanks to these inequalities, the equivalence :

\[ \Big(\forall k \in A,~ \chi^k_{ij} = 0 \Big) \Longleftrightarrow \chi^A_{ij} = 0 \]

An appreciated properties will be that a conditional independence between two nodes occurs if and only if, \(\chi^A_{ij}=0\) with \(A= V\setminus \{i,j\}\) and so \(\chi^k_{ij} = 0\) for all \(k \neq i,j\).

Simulation study for the Hüsler-Reiss model

We would like to test the last assumption in the case of a Hüsler-Reiss model, where the conditional independence can be verified easily. For \(d =3\) for example, let’s define the variogram matrix of our Hüsler-Reiss model :

\[ \Gamma = \begin{pmatrix} 0 & a&b \\ a & 0 & c \\ b & c & 0 \end{pmatrix} \]

Then, thanks to the extended precision matrix \(\Theta\) defined in (Hentschel, Engelke, and Segers 2023) we have that :

  • \(Y_1 \perp \!\! \perp Y_2 | Y_3 \Longleftrightarrow a=b+c\)

  • \(Y_1 \perp \!\! \perp Y_3 | Y_2 \Longleftrightarrow b=a+c\)

  • \(Y_2 \perp \!\! \perp Y_3 | Y_1 \Longleftrightarrow c=a+b\)

More, in the Hüsler-Reiss model, the extremal coefficient (bivariate an trivariate) can be computed fastly using :

  • \(\chi_{ij} = 2-2\Phi(\sqrt{\Gamma_{ij}}/2)\)

  • \(\chi_{ijk} = 3-2\Phi(\sqrt{\Gamma_{ij}}/2)-2\Phi(\sqrt{\Gamma_{jk}}/2)-2\Phi(\sqrt{\Gamma_{ik}}/2)+ \sum_{k=1}^3 \Phi_2((\sqrt{\Gamma_{ik}}/2)_{i\neq k}; A_k\Sigma^{(k)}A_k)\) with \(A_k = \text{diag}((\sqrt{\Gamma_{ik}}^{-1})_{i\neq k})\) and where \(\Phi\) is the distribution function of a standard Gaussian variable and \(\Phi_2(.;\Sigma)\) the distribution function of a gaussian vector of covariance matrix \(\Sigma\).

We produce a random matrix restricted by the relation \(a=b+c\) (to get a graph with no node linking \(1\) and \(2\)). The coefficients \(b\) and \(c\) are simulated with by an uniform distribution on \([0,5]\).

tar_load(Gamma_cond_indep_12)       # load our simulated variogram
print(Gamma_cond_indep_12)
##         [,1]    [,2]    [,3]
## [1,] 5.75331 4.73967 1.01364

Of course, we have to check the conditional negative definiteness of the matrix (to know if the matrix is a possible Hüsler-Reiss variogram). In order to do that we use the graphicalExtremes.

checkGamma(to_matrix(Gamma_cond_indep_12), returnBoolean = T)    # function checking the validity 
## [1] TRUE

Now, we are able to compute the trivariate coefficient, and consequently check if this is zero or not.

tar_load(chi_trivariate_example)    # load the value of the trivariate coef for our matrix
print(chi_trivariate_example)
## [1] 0.2001413

So finally it is not 0…

This is not very surprising because the condition that we wanted is about a coefficient which is totally symmetric between \(i\),\(j\) and \(k\), so the conditional independence (in the \(d=3\) case at least) of two implies the conditional independence for all and then the graph is empty.

But, let’s check another thing. We are going to simulate several matrix and see what is the trend for the value of the conditional extremal coefficient.

Simulation with \(n\) replicates

Here, let’s check for our example for a tri-dimensional Hüsler-Reiss graphical model with no edge between nodes 1 and 2 if there we can tell something else with different variogram. For our setup :

  • simulate \(n=2000\) matrices

  • the coefficient \(b\) and \(c\) are produced by a uniform random variable in the interval \([0,25]\).

For the conditional extremal coefficient, we get these results :

Conditional coefficient computed with the simulations

Conditional coefficient computed with the simulations

We can not have any conclusion with such figures… Perhaps, we notice a concentration near \(0,1\) but nothing else. If we focus our attention in the trivariate coefficient, we will obtain the following figure :

Trivariate coefficient computed with the simulations

Trivariate coefficient computed with the simulations

Again, we can not say anything, even if the concentration around 0 is a little bit stronger. We can explain this by the fact that the trivariate coefficient is lower that the conditional one.

However, under the conditional independence assumption, one can notice that :

\[ \chi_{123} = \mathbb P(Y_1>1, Y_2 > 1 ~ | ~Y_3 > 1) = \mathbb P(Y_1>1 ~ | ~Y_3 > 1)\mathbb P(Y_2 > 1 ~ | ~Y_3 > 1) = \chi_{1,3} \chi_{2,3} \] and so that the quotient coefficient \(\frac{\chi_{123}}{\chi_{1,3} \chi_{2,3}} = 1\).

Finally, it is not… But there is what I’ve done which indicate the equality is false. I will explain why below.

Let’s see if it is the case for our previous simulations. We obtained :

Computation of the quotient coefficient (left, the expected value 1 is the grey dashed line) and gap between the two coefficients (right).

Computation of the quotient coefficient (left, the expected value 1 is the grey dashed line) and gap between the two coefficients (right).

We can notice that is not really what we expected… It seems that there is some error during the computation, but if we try with others values for the coefficient of the variogram that there are close to zero, we notice better more accurate results as it should be :

Quotient value for each possible matrix when conditional independence

Quotient value for each possible matrix when conditional independence

Perhaps, there is an effect of the fact that our values are very small. To see this effect, let’s build the absolute and relative error for the coefficient computation and see what we can say.

Different error types for each possible matrix when conditional independence

Different error types for each possible matrix when conditional independence

Actually, we made a mistake. The relation of independence we wrote above was a pure independence relationship, not conditional dependence… With another notation it corresponds to :

\[ \mathbb P(Y_1^3>1, Y_2^3 >1) =\mathbb P(Y_1^3>1)\times \mathbb P (Y_2^3 >1) \] which is true when \(Y_1^3 \perp \!\! \perp Y_2^3\), not when \(Y_1^3 \perp \!\! \perp Y_2^3 ~ | ~ Y_3^3\).

To get the relation, except using the factorization of the density, I don’t know really how to formulate the conditional independence using probabilities because the notion of conditional probabilities given a random variable is very weird and not that intuitive (because it’s conditional given a particular \(\sigma\)-algebra)…

Link between trivariate and bivariate

We would like to express the trivariate coefficient in terms of bivariate extremal coefficient and \(SECO\) in the case where \(d=3\). Let’s recall that we can show :

\[ \chi_{123} = 3 - \Lambda_{12}(1,1) -\Lambda_{13}(1,1)-\Lambda_{23}(1,1)+\Lambda_{123}(1,1,1) \]

Then we have :

\[\begin{align*} \chi_{123} &= 3 - \Lambda_{12}(1,1) -\Lambda_{13}(1,1)-\Lambda_{23}(1,1)+\Lambda_{123}(1,1,1) \\ & = 1 + \chi_{12} - \Lambda_{13}(1,1)-\Lambda_{23}(1,1)+\Lambda_{123}(1,1,1) \\ & = \chi_{12} + \chi_{13}-(1 + \Lambda_{23}(1,1)-\Lambda_{123}(1,1,1)) \\ & = \chi_{12} + \chi_{13}-SECO_{\{1\}, \{2,3\}} \\ \end{align*}\]

In the same way, we can also write :

\[ \chi_{123} = \chi_{12} + \chi_{23}-SECO_{\{2\}, \{1,3\}} =\chi_{13} + \chi_{23}-SECO_{\{3\}, \{1, 2\}} \]

Scatter plot of the trivariate coefficient over the three bivariate coefficient.

Scatter plot of the trivariate coefficient over the three bivariate coefficient.

We can also represent the general scatter plot overall :

This shape is easy to explain. Indeed, let’s take a look at the expression of the bivariate coefficient in the case where we have conditional independence between 1 and 2 knowing 3 :

\[ \chi_{12} = 2-2\Phi(\sqrt{b+c}/2),\quad \chi_{13} = 2-2\Phi(\sqrt{b}/2), \quad \chi_{23} = 2-2\Phi(\sqrt{c}/2) \]

Now, let’s consider the mapping \(g: x \mapsto 2-2\Phi(\sqrt{x}/2)\) pour \(x > 0\). Then, it is a bijection between \(\mathbb R_+^*\) and \([0,1]\). so we can find the inverse of this application, denoted by \(g^{-1}\). Therefore, we can write :

\[ \chi_{12} = g(b+c) = f(\chi_{13}, \chi_{23}) \] with \(f(x,y) = g(g^{-1}(x) + g^{-1}(y))\).

Thus, the scatter plot \((\chi_{12}, \chi_{13}, \chi_{23})\) is a parameterised surface.

More, when we plot the scatter plot between one bivariate coefficient and the trivariate coefficient, the paramaterized representation can explain the difference between the first plot and the others.

Actually, the trivariate coefficient depends of two parameters \(b\) and \(c\) that we can find in the variogram matrix \(\Gamma\), and this is also the case for \(\chi_{12}\). However, \(\chi_{13}\) and \(\chi_{23}\) are parameterised by \(b\) and \(c\) respectively (so is the image of a uni-dimensional function).

The question is : the shape of this scatter plot is due to the fact that there exist a conditional independence between the components, or just to mathematical properties induce by the bivariate representation of both coefficient ?

Let’s take now a \(3 \times 3\) variogram matrix such that : \[ a = b+10c \] (to keep the bivariate representation and loose the conditional independence).

Scatter plot of the trivariate coefficient over the three bivariate coefficient.

Scatter plot of the trivariate coefficient over the three bivariate coefficient.

Well, even if we have a better fitting between the scatter plot and the line \(y=x\), we can notice there is an inequality that is always verified between the trivariate coefficient and the bivariate ones. It is :

\[ \chi_{123} \le \chi_{12} \wedge \chi_{13} \wedge \chi_{23} \] that we can call the tri-bivariate inequality, and is true.

Proofs

Alternative writing for the trivariate coefficient

We will show that we can compute the trivariate coefficient with the formula : \[ 3 - \Lambda_{ij}(1,1) -\Lambda_{ik}(1,1)-\Lambda_{jk}(1,1)+\Lambda_{ijk}(1,1,1) \] and the two writing can be expressed with the latter.

First, let’s take a look to the limit side :

\[\begin{align*} \chi_{ijk}(q) =& \frac{1}{1-q} \big[1-3q +\mathbb P(F(X_i)<q,F(X_j)<q)+\mathbb P(F(X_i)<q,F(X_k)<q) \\ & +\mathbb P(F(X_j)<q,F(X_k)<q)-\mathbb P(F(X_i)<q,F(X_j)<q,F(X_k)<q)\big]\\ = & 3 - \frac{1-\mathbb P(F(X_i)<q,F(X_j)<q)}{1-q}- \frac{1-\mathbb P(F(X_i)<q,F(X_k)<q)}{1-q}\\ & - \frac{1-\mathbb P(F(X_j)<q,F(X_k)<q)}{1-q} + \frac{1-\mathbb P(F(X_i)<q,F(X_j)<q,F(X_k)<q)}{1-q}\\ = & 3 -\frac{1}{F^{-1}(q)}\big[\frac{F^{-1}(q)(1-\mathbb P(X_{ij}<F^{-1}(q)))}{1-q}- \frac{F^{-1}(q)(1-\mathbb P(X_{ik}<F^{-1}(q)))}{1-q}\\ & - \frac{F^{-1}(q)(1-\mathbb P(X_{jk}<F^{-1}(q)))}{1-q} + \frac{F^{-1}(q)(1-\mathbb P(X_{ijk}<F^{-1}(q)))}{1-q}\big]\\ \end{align*}\]

But, as \(X\) is a random vector with standard Frechet marginals, \(X\) is a regular variation random vector, that means that :

\[ \lim_{u \rightarrow \infty} u(1- \mathbb P(X_C<u)) = \Lambda_C(1\!\!1_C) \] where \(\Lambda_C\) is the exponent measure of the marginal extreme limit of \(X_C\).

But, we have \(F^{-1}(q) \rightarrow \infty\) and \(-\log(q) \sim 1-q\) when \(q \rightarrow 1\) so we obtain :

\[ \lim_{q \rightarrow 1} \chi_{ijk}(q) =3 - \Lambda_{ij}(1,1) -\Lambda_{ik}(1,1)-\Lambda_{jk}(1,1)+\Lambda_{ijk}(1,1,1) \]

Now, let’s develop the other term. Using the equation (16) in (Engelke and Hitz 2020), if we note \(\lambda\) the density of the exponent measure with respect to the Lebesgue measure, we have :

\[ \mathbb P(Y_i>1, Y_j>1 ~|~Y_k>1) = \int_{[1,\infty[^3} \lambda(y)dy = \Lambda([1,\infty[^3) \]

The properties of valid density give us 3 equations that we can use to get that :

\[ 3 = \Lambda(]0,\infty[\times[1,\infty[^2) + \Lambda([1,\infty[\times]0,\infty[\times[1,\infty[) + \Lambda([1,\infty[^2\times]0,\infty[) \] These quantities can be decomposed as several “volume” in the space. We can merge some volume between them to obtain this relation : \[ 3 = \Lambda_{ij}(1,1) +\Lambda_{ik}(1,1)+\Lambda_{jk}(1,1)-\Lambda_{ijk}(1,1,1) + \Lambda([1,\infty[^3) \]

and we have the right formula by isolating the \(\Lambda([1,\infty[^3)\) term.

Inequalities of section 2

  • \(\chi^k_{ij} \le |A| \chi^A_{ij}\) :

Let \(k \in A\). We have by definition that :

\[\begin{align*} \mathbb P(F(X_i)>q~|~F(X_j)>q,F(X_k)>q) &= \frac{\mathbb P(F(X_i)>q,F(X_j)>q,F(X_k)>q)}{\mathbb P(F(X_j)>q,F(X_k)>q)} \\ &\le \frac{\mathbb P(F(X_i)>q,F(X_j)>q,||F(X_A)||_{\infty}>q)}{\mathbb P(F(X_j)>q,F(X_k)>q)} \\ &\le \chi^A_{ij}(q) \frac{\mathbb P(F(X_j)>q,||F(X_A)||_{\infty}>q)}{\mathbb P(F(X_j)>q,F(X_k)>q)} \\ &\le \chi^A_{ij}(q) \sum_{k\in A}\frac{\mathbb P(F(X_j)>q,F(X_k)>q)}{\mathbb P(F(X_j)>q,F(X_k)>q)} \\ &\le |A| \times\chi^A_{ij}(q) \end{align*}\]

If we take the limit of both side, we get : \[ \chi^k_{ij} \le |A| \times\chi^A_{ij} \] which shows the first inequality.

  • \(\chi^A_{ij} \le \sum_{k\in A} \chi^k_{ij}\) :

We use the alternative writing for this proof. If we apply again the definition, we have :

\[\begin{align*} \chi^A_{ij} &= \frac{\mathbb P(Y_i>1,Y_j>1, ||Y_A||_\infty >1)}{\mathbb P(Y_j>1, ||Y_A||_\infty >1)}\\ & \le \sum_{k \in A}\frac{\mathbb P(Y_i>1,Y_j>1, Y_k >1)}{\mathbb P(Y_j>1, ||Y_A||_\infty >1)} \\ & \le \sum_{k \in A}\chi^k_{ij}\frac{\mathbb P(Y_j>1, Y_k >1)}{\mathbb P(Y_j>1, ||Y_A||_\infty >1)} \\ \end{align*}\]

but as \(\mathbb P(Y_j>1, Y_k >1)\le \mathbb P(Y_j>1, ||Y_A||_\infty >1)\) for all \(k \in A\), we finally have :

\[ \chi^A_{ij} \le \sum_{k\in A} \chi^k_{ij} \] which is what we wanted.

Tri-bivariate inequality

Let’s show that :

\[ \chi_{123} \le \chi_{12} \]

the others can be show in the same way.

Let \(q \in (0,1)\), we have :

\[\begin{align*} \mathbb P(F(X_1)>q,F(X_2)>q~|~F(X_3)>q) &= \frac{\mathbb P(F(X_1)>q,F(X_2)>q,F(X_3)>q)}{\mathbb P(F(X_3)>q)} \\ &\le \frac{\mathbb P(F(X_1)>q,F(X_2)>q)}{\mathbb P(F(X_3)>q)} \\ &\le \frac{\mathbb P(F(X_2)>q)}{\mathbb P(F(X_3)>q)} \mathbb P(F(X_1)>q~|~F(X_2)>q) \\ &\le \mathbb P(F(X_1)>q~|~F(X_2)>q) \\ \end{align*}\]

If we take the limit of both side, we get : \[ \chi_{123} \le \chi_{12} \]

The same proof’s structure holds for the other bivariate coefficient, so we can deduce that : \[ \chi_{123} \le \chi_{12} \wedge \chi_{13} \wedge \chi_{23} \]

References

Engelke, Sebastian, and Adrien S. Hitz. 2020. “Graphical Models for Extremes.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 82 (4): 871–932. https://doi.org/10.1111/rssb.12355.
Ferreira, H. 2011. “Dependence Between Two Multivariate Extremes.” Statistics & Probability Letters 81 (5): 586–91. https://doi.org/10.1016/j.spl.2011.01.014.
Hentschel, Manuel, Sebastian Engelke, and Johan Segers. 2023. “Statistical Inference for Hüsler-Reiss Graphical Models Through Matrix Completions.” arXiv. https://doi.org/10.48550/arXiv.2210.14292.
Strokorb, Kirstin. 2020. “Extremal Independence Old and New.” arXiv. https://doi.org/10.48550/arXiv.2002.07808.